Abstract. The aim of this paper is to recover the regression function with sup norm loss. We construct an asymptotically sharp estimator which converges with the spatially dependent rate

$$ r_{n,\mu}(x) = P \big( \log n / (n \mu(x)) \big)^{s/(2s+1)}, $$

where $\mu$ is the design density, $s$ the regression smoothness, $n$ the sample size and $P$ is a constant expressed in terms of a solution to a problem of optimal recovery as in Donoho (1994). We prove this result under the assumption that $\mu$ is positive and continuous. This estimator combines kernel and local polynomial methods, where the kernel is given by optimal recovery, which allows us to prove the result up to the constants for any $s > 0$. Moreover, the estimator does not depend on $\mu$. We prove that $r_{n,\mu}(x)$ is optimal in a sense which is stronger than the classical minimax lower bound. Then, an inhomogeneous confidence band is proposed. This band has a non-constant length which depends on the local amount of data.



1. Introduction & main results
1.1. The model. Suppose we observe $(X_i, Y_i)$, $1 \le i \le n$, from

$$ Y_i = f(X_i) + \xi_i, \tag{1.1} $$

where the $\xi_i$ are i.i.d. centered Gaussian with variance $\sigma^2$ and independent of the $X_i$, and the $X_i$ are i.i.d. with density $\mu$ on $[0, 1]$, which is bounded away from 0. We want to recover $f$. In this model, when $\mu$ is not the uniform law, we say that the information is spatially inhomogeneous.



1.2. Methodology. There are several ways to assess the quality of an estimation procedure. A first approach is local: we focus on recovering $f$ at a fixed point $x_0 \in [0,1]$. Over a function class $\Sigma$, the minimax risk is given by

$$ \mathcal{R}_n(\Sigma, x_0) = \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\{ |\hat f_n(x_0) - f(x_0)| \}, $$



Date: 2nd February 2008. 

2000 Mathematics Subject Classification. 62G05, 62G08, 62G15. 

Key words and phrases, random design, sharp estimation, inhomogeneous data, nonparametric 
regression. 




S. GAIFFAS 



where the infimum is taken among all estimators. We say that $\psi_n(x_0) > 0$ is the minimax convergence rate at $x_0$ if

$$ 0 < \liminf_n \frac{\mathcal{R}_n(\Sigma, x_0)}{\psi_n(x_0)} \le \limsup_n \frac{\mathcal{R}_n(\Sigma, x_0)}{\psi_n(x_0)} < +\infty. $$

In this paper, we are interested in recovering $f$ globally. We consider the loss with sup norm defined by $\|g\|_\infty = \sup_{x \in [0,1]} |g(x)|$. In this case, the minimax risk is

$$ \mathcal{R}_n(\Sigma) = \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\{ \|\hat f_n - f\|_\infty \}, \tag{1.2} $$

and we say that $\psi_n$ is the minimax convergence rate if

$$ 0 < \liminf_n \frac{\mathcal{R}_n(\Sigma)}{\psi_n} \le \limsup_n \frac{\mathcal{R}_n(\Sigma)}{\psi_n} < +\infty. $$

An advantage of this norm is that it is exacting: it forces an estimator to behave well at every point simultaneously. In the regression model (1.1) with $\Sigma$ a Hölder ball with smoothness $s > 0$, we have when $\mu$ is positive and bounded that $\psi_n \asymp (\log n / n)^{s/(2s+1)}$ (see Stone (1982)), where $a_n \asymp b_n$ means $0 < \liminf_n a_n/b_n \le \limsup_n a_n/b_n < +\infty$.

However, when $\mu$ is positive and bounded, $\psi_n$ is not sensitive to the variations in the amount of data. An improvement is to consider instead of (1.2) the spatially dependent risk

$$ \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\Big\{ \sup_{x \in [0,1]} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big\}, $$

where $\hat f_n$ is some estimator and $r_n(\cdot) > 0$ is a family of spatially dependent normalisation factors. If this quantity is bounded as $n$ goes to infinity, we say that $r_n(\cdot)$ is an upper bound over $\Sigma$. If we look for such upper bounds, we clearly find that $r_n(x) \asymp \psi_n$ for any $x$, thus we must sharpen this upper bound up to constants. Here, we follow the latter approach in the asymptotic minimax context. In this paper, we develop the consequences of inhomogeneous data within this framework.

1.3. Upper and lower bounds. If $s, L > 0$, we define the Hölder ball $\Sigma(s, L)$, which is the set of all the functions $f : [0, 1] \to \mathbb{R}$ such that for any $x, y \in [0, 1]$,

$$ |f^{(k)}(x) - f^{(k)}(y)| \le L |x - y|^{s-k}, $$

where $k = \lfloor s \rfloor$ is the largest integer $k < s$. If $Q > 0$, we denote by $\Sigma^Q(s, L)$ the set of functions $f \in \Sigma(s, L)$ such that $\|f\|_\infty \le Q$, and we denote simply $\Sigma = \Sigma^Q(s, L)$. Throughout this study, we suppose:

Assumption D. For some $0 < \nu \le 1$ and $\varrho, q > 0$, we have

$$ \mu \in \Sigma(\nu, \varrho) \quad \text{and} \quad \mu(x) \ge q \ \text{ for all } x \in [0, 1]. $$

In the following, a loss function $w(\cdot)$ is any non-negative and nondecreasing function such that $w(x) \le A(1 + |x|^b)$ for some $A, b > 0$ (an example is $w(\cdot) = |\cdot|^p$ for $p > 0$). Let us consider

$$ r_{n,\mu}(x) = \Big( \frac{\log n}{n \mu(x)} \Big)^{s/(2s+1)}. \tag{1.3} $$
We denote by $\mathbb{E}^n_{f\mu}$ the integration with respect to the joint law $\mathbb{P}^n_{f\mu}$ of the observations $(X_i, Y_i)$, $1 \le i \le n$. Our first result shows that $r_{n,\mu}(\cdot)$ is, up to the constants, an upper bound over $\Sigma$.

Theorem 1 (Upper bound). Under Assumption D, if $\hat f_n$ is the estimator defined in Section 3, we have for any $s, L > 0$,

$$ \limsup_n \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\Big\{ w\Big( \sup_{x \in [0,1]} r_{n,\mu}(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big\} \le w(P), \tag{1.4} $$

where

$$ P = \sigma^{2s/(2s+1)} L^{1/(2s+1)} \varphi_s(0) \Big( \frac{2}{2s+1} \Big)^{s/(2s+1)} \tag{1.5} $$

and $\varphi_s$ is defined as the solution of the optimisation problem

$$ \varphi_s = \operatorname*{argmax}_{\varphi \in \Sigma(s,1;\mathbb{R}),\ \|\varphi\|_2 \le 1} \varphi(0), \tag{1.6} $$

where $\Sigma(s, L; \mathbb{R})$ is the extension of $\Sigma(s, L)$ to the whole real line.

In the same fashion as in Donoho (1994), the constant $P$ is defined via the solution of an optimisation problem which is connected to optimal recovery. For further details, see Sections 2 and A. The next theorem shows that $r_{n,\mu}(\cdot)$ is indeed optimal in an appropriate sense. In what follows, the notation $|I|$ stands for the length of an interval $I$.

Theorem 2 (Lower bound). Under Assumption D, if $I_n \subset [0, 1]$ is any interval such that for some $\varepsilon \in (0, 1)$,

$$ |I_n| \, n^{\varepsilon/(2s+1)} \to +\infty \quad \text{as } n \to +\infty, \tag{1.7} $$

we have

$$ \liminf_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\Big\{ w\Big( \sup_{x \in I_n} r_{n,\mu}(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big\} \ge w((1 - \varepsilon)P), $$

where $P$ is given by (1.5) and the infimum is taken among all estimators. A consequence is that if $I_n$ is such that (1.7) holds for any $\varepsilon \in (0, 1)$, we have

$$ \liminf_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\Big\{ w\Big( \sup_{x \in I_n} r_{n,\mu}(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big\} \ge w(P). \tag{1.8} $$

This result is discussed in detail in Section 2.4. Now, we construct a confidence band which is adapted to inhomogeneous data. Indeed, its length varies depending on the local amount of data.

1.4. An inhomogeneous confidence band. We define the empirical design sample distribution

$$ \bar\mu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}, $$

where $\delta$ is the Dirac mass, and for $h > 0$, $x \in [0, 1]$, we consider the intervals

$$ I(x, h) = \begin{cases} [x, x + h] & \text{when } 0 \le x \le 1/2, \\ [x - h, x] & \text{when } 1/2 < x \le 1. \end{cases} \tag{1.9} $$

The choice of non-symmetrical intervals allows us to avoid boundary effects. Then, we define the "bandwidth" at $x$ by

$$ H_n(x) = \operatorname*{argmin}_{h \in [0,1]} \Big\{ h^s \ge \Big( \frac{\log n}{n \bar\mu_n(I(x,h))} \Big)^{1/2} \Big\}, \tag{1.10} $$

which balances the bias and the variance of a certain kernel estimator (more in Section 3 below). We consider the sequence of points

$$ x_j = j \Delta_n, \qquad \Delta_n = (\log n)^{-2s/(2s+1)} n^{-1/(2s+1)}, \tag{1.11} $$

for $j \in J_n = \{0, \ldots, [\Delta_n^{-1}]\}$, where $[a]$ is the integer part of $a$, with $x_{M_n} = 1$, $M_n = |J_n|$ (the notation $|A|$ stands also for the size of a finite set $A$). If $x \in [x_j, x_{j+1})$, we define

$$ R_n(x) = H_n(x_j)^s, $$

and for any $x \in [0, 1]$, $\beta > 0$, we consider the band

$$ C_{n,\beta}(x) = \big[ \hat f_n(x) - (1 + \beta) P R_n(x),\ \hat f_n(x) + (1 + \beta) P R_n(x) \big], \tag{1.12} $$

where $P$ is defined by (1.5). The next proposition provides a control over the coverage probability of this band, uniformly over $[0, 1]$.

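Proposition 1. Given a confidence level $\alpha \in (0, 1)$, $C_{n,\beta}$ with

$$ \beta = \beta(n, \alpha) = \Big( \frac{\log(1/\alpha)}{D_c (\log n)^{2s/(2s+1)}} \Big)^{1/2} $$

(where $D_c$ is some positive constant), is under Assumption D a confidence band of asymptotic level $1 - \alpha$, namely:

$$ \inf_{f \in \Sigma} \mathbb{P}^n_{f\mu}\big\{ f(x) \in C_{n,\beta}(x), \text{ for all } x \in [0, 1] \big\} \ge 1 - \alpha, \tag{1.13} $$

for $n$ large enough. Moreover, we have for any $x \in [0, 1]$,

$$ \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\{ |C_{n,\beta}(x)| \} / r_{n,\mu}(x) \to 2P \quad \text{as } n \to +\infty. \tag{1.14} $$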

In Figures 1 and 2 we give a numerical illustration of this confidence band. We consider the function $f(x) = 0.3(1 - |x - 0.5|/0.3)_+$, where $a_+ = \max(a, 0)$. The first dataset is simulated with a uniform design and the second dataset with design density $\mu(x) = 0.05 + 11.4|x - 0.5|^2$. In this example $s = L = 1$, the sample size is $n = 500$ and the root-signal-to-noise ratio is 7.

When the data is homogeneous (uniform design), the length of the confidence band is almost constant, see Figure 1. In the non-uniform case, the band is narrower near the boundaries of $[0, 1]$, where there is more data, and wider in the middle, see Figure 2.
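To make the role of the data-driven bandwidth concrete, here is a minimal sketch that draws a design from the density $\mu(x) = 0.05 + 11.4|x - 0.5|^2$ of the second dataset and evaluates a bandwidth of the form (1.10) on a grid of candidate values. The function names, the rejection sampler, the grid search and the one-sided intervals $[x, x+h]$ for $x \le 1/2$ and $[x-h, x]$ otherwise are illustrative assumptions, not the implementation used for the figures.

```python
import numpy as np


def mu(x):
    """Design density of the second dataset: 0.05 + 11.4 |x - 0.5|^2 on [0, 1]."""
    return 0.05 + 11.4 * np.abs(x - 0.5) ** 2


def sample_design(n, rng):
    """Draw n design points from mu by rejection sampling (mu <= 2.9 on [0, 1])."""
    mu_max = 2.9
    out = np.empty(0)
    while out.size < n:
        cand = rng.uniform(0.0, 1.0, size=2 * n)
        keep = rng.uniform(0.0, mu_max, size=2 * n) <= mu(cand)
        out = np.concatenate([out, cand[keep]])
    return out[:n]


def bandwidth(x, X, s, grid_size=2000):
    """Smallest h on a grid with h^s >= sqrt(log n / (n * empirical mass of I(x, h)))."""
    n = X.size
    for h in np.linspace(1.0 / grid_size, 1.0, grid_size):
        # One-sided interval, pointing away from the nearest boundary (assumed form).
        lo, hi = (x, x + h) if x <= 0.5 else (x - h, x)
        mass = np.mean((X >= lo) & (X <= hi))
        if mass > 0 and h ** s >= np.sqrt(np.log(n) / (n * mass)):
            return h
    return 1.0  # no grid value satisfies the inequality


rng = np.random.default_rng(0)
X = sample_design(5000, rng)
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, bandwidth(x, X, s=1.0))
```

With this density the bandwidth is expected to be smaller near 0 and 1, where the design is dense, than near 0.5, which is the mechanism behind the varying width of the band.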



Figure 1. Confidence band with homogeneous data.





Figure 2. Confidence band with inhomogeneous data. 



1.5. Outline. The remainder of the paper is organised as follows. In Section 2 we discuss our results in detail and compare them with former results. In Section 3 we construct the estimator used in Theorem 1. The proofs are delayed until Sections 4 and 5. In Section A we recall some well known facts on optimal recovery, which are useful for the construction of our estimator and for the proofs.

2. Discussion 

2.1. Motivation. In most cases, the models considered in curve estimation do not allow situations where the data is inhomogeneous, in so far as the amount of data is implicitly assumed constant over space (or time). However, a growing literature works in models where the data can be inhomogeneously distributed. Recent results deal with the estimation of the regression function when the observation points are not equispaced or random, see for instance Antoniadis et al. (1997), Brown and Cai (1998), Wong and Zheng (2002), Maxim (2003), among others. The estimators proposed in these papers have good minimax properties, but the results are always stated in a way in which the following basic principle does not appear: an estimator should behave better at a point where there is much data than where there is little data. For instance, upper bounds are usually stated with the minimax rate, which is not sensitive to the variations in the local amount of data nor to the information distribution in the considered model.

At this stage, it is also natural to look for confidence bands when the data is inhomogeneous, and especially distributed with an unknown density. Following the above principle, a striking question is that of the construction of a confidence band with a length which depends on the local amount of data: such a band should be narrower where there is much data than where there is little data. The aim of this paper is to develop this new approach.



2.2. Literature. When the design is equidistant, that is $X_i = i/n$, we know from Korostelev (1993) the exact asymptotic value of the minimax risk for sup norm error loss. If

$$ \psi_n = \Big( \frac{\log n}{n} \Big)^{s/(2s+1)}, $$

we have for any $0 < s \le 1$ and $\Sigma = \Sigma(s, L)$

$$ \lim_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}_f\{ w(\psi_n^{-1} \|\hat f_n - f\|_\infty) \} = w(C), $$

where

$$ C = \sigma^{2s/(2s+1)} L^{1/(2s+1)} \Big( \frac{s+1}{2s^2} \Big)^{s/(2s+1)}. \tag{2.1} $$

This result was the first of its kind for sup norm error loss. The exact asymptotic value of the minimax risk was only known for square integrated norm error loss, see Pinsker (1980).

In the white noise model

$$ dY_t^n = f(t)\,dt + n^{-1/2}\,dW_t, \quad t \in [0, 1], \tag{2.2} $$

where $W$ is a standard Brownian motion, Donoho (1994) extends the result by Korostelev (1993) to any $s > 1$. In this paper, the author makes a link between statistical sup norm estimation and the theory of optimal recovery (see Section A). It is shown for any $s > 0$ and $\Sigma = \Sigma(s, L)$ that the minimax risk satisfies

$$ \lim_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}_f\{ w(\psi_n^{-1} \|\hat f_n - f\|_\infty) \} = w(P_1), \tag{2.3} $$

where $P_1$ is given by (1.5) with $\sigma = 1$. When $s \in (0, 1]$, we have $P = C$, see for instance Leonov (1997).
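The identity $P = C$ for $s \in (0, 1]$ can be checked numerically. The sketch below assumes the standard closed form of the extremal function in that range, $\varphi_s(t) = (a - |t|^s)_+$ with $a$ chosen so that $\|\varphi_s\|_2 = 1$; this closed form and the function names are assumptions introduced for illustration, and the formulas for $P$ and $C$ are those of (1.5) and (2.1).

```python
import math


def phi_s_at_zero(s):
    """phi_s(0) for 0 < s <= 1, assuming phi_s(t) = (a - |t|^s)_+ with unit L2 norm.

    For this shape, ||phi_s||_2^2 = a^{(2s+1)/s} * 4 s^2 / ((s + 1)(2s + 1)),
    so the normalisation ||phi_s||_2 = 1 determines a = phi_s(0).
    """
    return ((s + 1) * (2 * s + 1) / (4 * s ** 2)) ** (s / (2 * s + 1))


def constant_P(s, L, sigma):
    """Constant P of (1.5)."""
    e = s / (2 * s + 1)
    return sigma ** (2 * e) * L ** (1 / (2 * s + 1)) * phi_s_at_zero(s) * (2 / (2 * s + 1)) ** e


def constant_C(s, L, sigma):
    """Constant C of (2.1)."""
    e = s / (2 * s + 1)
    return sigma ** (2 * e) * L ** (1 / (2 * s + 1)) * ((s + 1) / (2 * s ** 2)) ** e


for s in (0.25, 0.5, 1.0):
    print(s, constant_P(s, L=1.0, sigma=1.0), constant_C(s, L=1.0, sigma=1.0))
```

The two columns agree because $\varphi_s(0)\,(2/(2s+1))^{s/(2s+1)} = ((s+1)/(2s^2))^{s/(2s+1)}$ under the assumed form of $\varphi_s$.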

Since the results by Korostelev and Donoho, many other authors have worked on the problem of sharp estimation (or testing) in sup norm. On testing, see Lepski and Tsybakov (2000); see Korostelev and Nussbaum (1999) for density estimation and Bertin (2004a) for white noise in an anisotropic setting.

While most papers on sharp estimation work in models with homogeneous information, the paper by Bertin (2004c) works in the model of regression with random design (1.1). When $\mu$ satisfies Assumption D and $\Sigma = \Sigma^Q(s, L)$ for $0 < s \le 1$, it is shown that

$$ \lim_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\{ w(v_{n,\mu}^{-1} \|\hat f_n - f\|_\infty) \} = w(C), \tag{2.4} $$

where $C$ is given by (2.1) and

$$ v_{n,\mu} = \Big( \frac{\log n}{n \inf_x \mu(x)} \Big)^{s/(2s+1)}. \tag{2.5} $$

Note that the rate $v_{n,\mu}$ differs from (and is larger than) $\psi_n$ when $\mu$ is not uniform. A disappointing fact is that $v_{n,\mu}$ depends on $\mu$ only via its infimum, which corresponds to the point in $[0, 1]$ where we have the least information. This rate does not take into account the regions with more data. It seems natural to wonder if we can improve this result, namely: can we replace $\inf_x \mu(x)$ by $\mu(x)$? Note that in Section 1.3 we have answered this question positively.

In this paper, we extend the result by Donoho (1994) to the model of regression with random design and we improve the result by Bertin (2004c) in several ways: our result holds for any $s > 0$, we construct an estimator which does not depend on $\mu$, and when the design is not uniform, our convergence rate $r_{n,\mu}(\cdot)$ is better (smaller) than $v_{n,\mu}$ at the order of constants. More importantly, this rate is adapted to the local amount of information of the model.
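As a small illustration of the gain at the level of constants, the following sketch compares the spatially dependent rate $(\log n/(n\mu(x)))^{s/(2s+1)}$ with the global rate $(\log n/(n \inf \mu))^{s/(2s+1)}$ for the example density $\mu(x) = 0.05 + 11.4|x - 0.5|^2$ of Section 1.4. The function names are hypothetical, and the value $\inf \mu = 0.05$ is read off the formula of that density.

```python
import math


def mu(x):
    """Example design density from Section 1.4."""
    return 0.05 + 11.4 * abs(x - 0.5) ** 2


def local_rate(x, n, s):
    """Spatially dependent rate (log n / (n mu(x)))^{s/(2s+1)}."""
    return (math.log(n) / (n * mu(x))) ** (s / (2 * s + 1))


def global_rate(n, s, mu_min):
    """Rate driven by the infimum of the density: (log n / (n mu_min))^{s/(2s+1)}."""
    return (math.log(n) / (n * mu_min)) ** (s / (2 * s + 1))


n, s = 500, 1.0
v = global_rate(n, s, mu_min=0.05)  # inf of mu is attained at x = 0.5
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, local_rate(x, n, s) / v)
```

The ratio equals $(\inf \mu / \mu(x))^{s/(2s+1)}$, so it is 1 at $x = 0.5$ and about 0.26 at the endpoints for $s = 1$.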



2.3. About theorem 1. We can understand the result of Theorem 1 heuristically. Following Brown and Low (1996) and Brown et al. (2002), we can find an "idealised" statistical experiment which is equivalent (in the sense that the Le Cam deficiency goes to 0) to the model (1.1). The model (1.1) is clearly equivalent to

$$ Y_i = f(G_\mu^{-1}(U_i)) + \xi_i, \quad 1 \le i \le n, $$

with independent and uniform $U_i$, where $G_\mu(x) = \int_0^x \mu(t)\,dt$. Under appropriate conditions on $f$ and $\mu$, we know from Brown et al. (2002) that this model is equivalent to

$$ dZ_t^n = f(G_\mu^{-1}(t))\,dt + \frac{\sigma}{\sqrt{n}}\,dW_t, \quad t \in [0, 1], $$

where $W$ is a Brownian motion. Informally, if $\mu$ is known we obtain by the time change $t = G_\mu(u)$,

$$ d\tilde Z_u^n = f(u)\mu(u)\,du + \sigma \sqrt{\frac{\mu(u)}{n}}\,dW_u, \quad u \in [0, 1], $$

where $\tilde Z_u^n = Z^n_{G_\mu(u)}$ and $W$ is a Brownian motion. Finally, we obtain that (1.1) is equivalent to the heteroscedastic white noise model

$$ dY_u^n = f(u)\,du + \frac{\sigma}{\sqrt{n\mu(u)}}\,dB_u, \quad u \in [0, 1], \tag{2.6} $$

where $B$ is a Brownian motion. In view of the result by Donoho (1994) (see (2.3)), which is stated in the model (2.2), and comparing the noise levels in the models (2.2) and (2.6) (with $\sigma = 1$), we can explain informally that our rate $r_{n,\mu}(\cdot)$ comes from the former rate $\psi_n$ where we replace $n$ by $n\mu(x)$.
2.4. About theorem 2. From Bertin (2004c), we know when $s \in (0, 1]$ that

$$ \liminf_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\{ w(v_{n,\mu}^{-1} \|\hat f_n - f\|_\infty) \} \ge w(P), $$

where $v_{n,\mu}$ is given by (2.5). An immediate consequence is

$$ \liminf_n \inf_{\hat f_n} \sup_{f \in \Sigma} \mathbb{E}^n_{f\mu}\Big\{ w\Big( \sup_{x \in [0,1]} r_{n,\mu}(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big\} \ge w(P), \tag{2.7} $$

where it suffices to use $r_{n,\mu}(x) \le v_{n,\mu}$ for any $x \in [0, 1]$. This entails that $r_{n,\mu}(\cdot)$ is optimal in the classical minimax sense, but this notion of optimality is weaker than ours. Indeed, to prove the optimality of $r_{n,\mu}(\cdot)$ we need a more "localised" version of the lower bound, hence Theorem 2.

In Theorem 2, if we choose $I_n = [0, 1]$ we recover (2.7), and if $I_n = [x - (\log n)^{-\gamma}, x + (\log n)^{-\gamma}] \cap [0, 1]$ for any $\gamma > 0$ and $x \in [0, 1]$ such that $\mu(x) \ne \inf_{x \in [0,1]} \mu(x)$, then obviously $v_{n,\mu}$ does not satisfy (1.8).

2.5. About proposition 1. The confidence band $C_{n,\beta}(\cdot)$ is "design adaptive", in the sense that it does not depend on $\mu$, but it depends on the smoothness of $f$ via the parameters $s$ and $L$. The construction of adaptive confidence bands is more involved. We know from Low (1997) that the construction of an adaptive confidence band without extra assumption is not feasible. However, if extra assumptions on the smoothness of $f$ are made, it is possible to construct such confidence bands, see Picard and Tribouley (2000), Hoffmann and Lepski (2002) and Cai and Low (2004a, b). Here, we only focus on the inhomogeneous aspect of the confidence band. Adaptation with respect to the smoothness is beyond the scope of this study, and we would encounter the same limitations.

2.6. About assumption D. In Assumption D, $\mu$ is supposed to be bounded from below, and from above since it is continuous over $[0, 1]$. When $\mu$ is vanishing or exploding at a fixed point, we know from Gaïffas (2004) that a wide range of pointwise minimax rates can be achieved, depending on the behaviour of $\mu$ at this point. In this case, we expect the optimal space dependent convergence rate (whenever it exists) to be different from the classical minimax rate $\psi_n$ not only up to the constants but in order.



3. Construction of an estimator 

3.1. Main idea. The estimator $\hat f_n$ described below uses both kernel and local polynomial methods. Its construction is divided into two parts: first, at the discretisation points $x_j$ defined by (1.11), we use a Nadaraya-Watson estimator with a design data driven bandwidth. This part of the estimator is used to attain the minimax constant. Between the discretisation points, the estimator is defined by a Taylor expansion where the derivatives are estimated by local polynomial estimation.
3.2. The estimator at points $x_j$. We consider the bandwidth $H_n(x)$ defined by (1.10) and we define

$$ H_n^M = \max_{j \in J_n} H_n(x_j), $$

where $x_j$ and $J_n$ are defined in Section 1.4. From Leonov (1997, 1999) we know that the function $\varphi_s$ defined by (1.6) is even and compactly supported. We denote by $[-T_s, T_s]$ its support and $\tau_n = \min\{2 c_s T_s H_n^M, \delta_n\}$, where $\delta_n = (\log n)^{-1}$ and

$$ c_s = \Big( \frac{\sigma}{L} \Big)^{2/(2s+1)} \Big( \frac{2}{2s+1} \Big)^{1/(2s+1)}. \tag{3.1} $$

As usual with the estimation of a function over an interval, there is a boundary correction. We decompose the unit interval into three parts $[0, 1] = J_{n,1} \cup J_{n,2} \cup J_{n,3}$, where $J_{n,1} = [0, \tau_n]$, $J_{n,2} = [\tau_n, 1 - \tau_n]$ and $J_{n,3} = [1 - \tau_n, 1]$. We also define $\mathcal{J}_{a,n} = \{ j : x_j \in J_{n,a} \}$ for $a \in \{1, 2, 3\}$. If $\varphi_s$ is defined by (1.6), we consider the kernel

$$ K_s = \varphi_s \Big/ \int \varphi_s. \tag{3.2} $$

The "sharp" part of the estimator is defined as follows: at the points $x_j$, we define $\hat f_n$ by

$$ \hat f_n(x_j) = \begin{cases} \dfrac{ \dfrac{1}{n H_n(x_j)} \sum_{i=1}^n Y_i K_s\Big( \dfrac{X_i - x_j}{c_s H_n(x_j)} \Big) }{ \max\Big[ \delta_n,\ \dfrac{1}{n H_n(x_j)} \sum_{i=1}^n K_s\Big( \dfrac{X_i - x_j}{c_s H_n(x_j)} \Big) \Big] } & \text{if } j \in \mathcal{J}_{2,n}, \\[2ex] \bar f_n(x_j) & \text{if } j \in \mathcal{J}_{1,n} \cup \mathcal{J}_{3,n}. \end{cases} \tag{3.3} $$

This estimator is (up to the correction near the boundaries) a Nadaraya-Watson estimator with the optimal kernel $K_s$ and a bandwidth adjusted to the local amount of data. The boundary estimator $\bar f_n$ is defined below.
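The interior part of this construction can be sketched in a few lines. The code below is an illustration under stated assumptions, not the author's implementation: it takes the kernel to be $\varphi_s / \int \varphi_s$ with the closed form $\varphi_s(t) = (a - |t|^s)_+$, which is only valid for $0 < s \le 1$, and it reads the estimator as a kernel-weighted average of the $Y_i$ with bandwidth $c_s H$ whose denominator is floored at $\delta_n$. The names `make_kernel` and `nw_estimate` are hypothetical.

```python
import numpy as np


def make_kernel(s):
    """Return (K, T): K = phi_s / int(phi_s) and the half-width T of its support.

    Assumption: 0 < s <= 1 and phi_s(t) = (a - |t|^s)_+ with ||phi_s||_2 = 1.
    """
    a = ((s + 1) * (2 * s + 1) / (4 * s ** 2)) ** (s / (2 * s + 1))
    T = a ** (1.0 / s)
    integral = 2.0 * a * T * s / (s + 1)  # closed form of the integral of phi_s

    def K(t):
        return np.maximum(a - np.abs(t) ** s, 0.0) / integral

    return K, T


def nw_estimate(xj, X, Y, H, c_s, K, delta_n):
    """Nadaraya-Watson value at xj with bandwidth c_s * H and floored denominator."""
    n = X.size
    w = K((X - xj) / (c_s * H))
    num = np.sum(Y * w) / (n * H)
    den = max(delta_n, np.sum(w) / (n * H))  # the floor avoids dividing by ~0
    return num / den


# Usage on a noiseless constant signal with an equispaced design.
n = 2000
X = (np.arange(n) + 0.5) / n
Y = np.full(n, 2.0)
K, T = make_kernel(1.0)
print(nw_estimate(0.5, X, Y, H=0.1, c_s=1.0, K=K, delta_n=1.0 / np.log(n)))
```

The floor $\delta_n$ only matters when very few design points fall in the kernel window; otherwise the ratio is the usual weighted average.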

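3.3. Between the points $x_j$: local polynomial estimation. We recall that $k = \lfloor s \rfloor$, where $s$ is the smoothness of the unknown signal $f$. For any interval $I \subset [0, 1]$, we define the inner product

$$ \langle f, g \rangle_I = \frac{1}{\bar\mu_n(I)} \int_I f g \, d\bar\mu_n, $$

where $\int_I f \, d\bar\mu_n = \sum_{X_i \in I} f(X_i)/n$. If $I = I(x, h)$ (see (1.9)) for some $x \in [0, 1]$ and $h > 0$, we define $\phi_{I,m}(y) = (y - x)^m$ and we introduce the matrix $\mathbf{X}_I$ and vector $\mathbf{Y}_I$ with entries

$$ (\mathbf{X}_I)_{p,q} = \langle \phi_{I,p}, \phi_{I,q} \rangle_I \quad \text{and} \quad (\mathbf{Y}_I)_p = \langle Y, \phi_{I,p} \rangle_I, $$

for $0 \le p, q \le k$. Let us define

$$ \bar{\mathbf{X}}_I = \mathbf{X}_I + \frac{1}{\sqrt{n \bar\mu_n(I)}} \, \mathbf{I}_{k+1} \mathbf{1}_{\Omega_{n,I}^c}, $$

where $\Omega_{n,I} = \{ \lambda(\mathbf{X}_I) > 1/\sqrt{n \bar\mu_n(I)} \}$, $\lambda(M)$ is the smallest eigenvalue of a matrix $M$ and $\mathbf{I}_{k+1}$ is the identity matrix on $\mathbb{R}^{k+1}$. Note that the correction term in $\bar{\mathbf{X}}_I$ entails $\lambda(\bar{\mathbf{X}}_I) \ge 1/\sqrt{n \bar\mu_n(I)}$. When $\bar\mu_n(I) > 0$, the solution $\hat\theta_I$ of the system

$$ \bar{\mathbf{X}}_I \theta = \mathbf{Y}_I $$

is well defined. If $\bar\mu_n(I) = 0$, we take $\hat\theta_I = 0$. Then, for any $1 \le m \le k$, a natural estimate of $f^{(m)}(x_j)$ is

$$ \hat f_n^{(m)}(x_j) = m! \, (\hat\theta_{I(x_j, h_n)})_m, $$

where

$$ h_n = (\sigma/L)^{2/(2s+1)} (\log n / n)^{1/(2s+1)}, $$

and the estimator at the boundaries of $[0, 1]$ is given by

$$ \bar f_n(x_j) = (\hat\theta_{I(x_j, t_n)})_0, $$

where $t_n = (\sigma/L)^{2/(2s+1)} n^{-1/(2s+1)}$. Note that the boundary estimator is a local polynomial estimator with the pointwise bandwidth of estimation $t_n$. If we define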

r n J = { min ||0 /ni ||/ ^ — ), (3.4) 
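where $\| \cdot \|_I^2 = \langle \cdot, \cdot \rangle_I$, then for $x \in [x_j, x_{j+1})$, $j \in J_n$, we take

$$ \hat f_n(x) = \hat f_n(x_j) + \Big( \sum_{m=1}^{k} \frac{\hat f_n^{(m)}(x_j)}{m!} (x - x_j)^m \Big) \mathbf{1}_{\Gamma_{n, I(x_j, h_n)}}. \tag{3.5} $$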
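The local polynomial step amounts to solving a small linear system built from empirical moments over a one-sided window. The sketch below is a minimal illustration with hypothetical names; it assumes the window is $[x, x+h]$ for $x \le 1/2$ and $[x-h, x]$ otherwise, and it assumes that the eigenvalue correction adds $(\text{count})^{-1/2}$ times the identity whenever the smallest eigenvalue of the moment matrix falls below that threshold, where the count is the number of design points in the window.

```python
import math

import numpy as np


def local_poly_coeffs(x, h, X, Y, k):
    """Coefficients theta_0..theta_k of a local polynomial fit around x.

    theta_m estimates f^(m)(x) / m!. The moment matrix and right-hand side are
    empirical averages over the design points falling in the one-sided window.
    """
    lo, hi = (x, x + h) if x <= 0.5 else (x - h, x)
    inside = (X >= lo) & (X <= hi)
    count = int(inside.sum())
    if count == 0:
        return np.zeros(k + 1)  # no data in the window: take theta = 0
    # Design matrix with columns (X_i - x)^p, p = 0..k.
    D = np.vander(X[inside] - x, k + 1, increasing=True)
    XI = D.T @ D / count
    YI = D.T @ Y[inside] / count
    thr = 1.0 / math.sqrt(count)
    if np.linalg.eigvalsh(XI)[0] <= thr:
        # Assumed form of the correction: shift the spectrum above the threshold.
        XI = XI + thr * np.eye(k + 1)
    return np.linalg.solve(XI, YI)


# Usage: noiseless linear signal on an equispaced design, window [0, 1].
n = 1000
X = (np.arange(n) + 0.5) / n
Y = 1.0 + 2.0 * X
theta = local_poly_coeffs(0.0, 1.0, X, Y, k=1)
print(theta)  # theta[1] * 1! estimates f'(0)
```

When the correction is triggered the fit is biased towards zero, which is the price paid for a system that is always well conditioned.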

4. Proof of theorem 1 and proposition 1 

The proof of Theorem 1 needs several preliminary results. In Section 4.1 we state the most important lemmas while Section 4.2 is devoted to useful results concerning local polynomial estimation. We delay the proofs of these lemmas until Section 4.4 since they can be skipped in a first reading. The proofs of Theorem 1 and Proposition 1 are given in Section 4.3. We define the risk

$$ \mathcal{E}_{n,f} = \sup_{x \in [0,1]} r_{n,\mu}(x)^{-1} |\hat f_n(x) - f(x)|, $$

and the discretised risk $\mathcal{E}^{\Delta}_{n,f} = \sup_{j \in J_n} r_{n,\mu}(x_j)^{-1} |\hat f_n(x_j) - f(x_j)|$.

In the following, the notation $o(1)$ stands for a deterministic and positive quantity going to 0 as $n \to +\infty$ independent of $f$, while $O(1)$ stands for a quantity bounded by a positive quantity independent of $f$. If $A$ is non-negative, we also define $O(A) = O(1) \times A$. We denote $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. We consider the norms $\|g\|_\infty = \sup_{x \in [0,1]} |g(x)|$, $\|g\|_2 = (\int_0^1 g^2(x)\,dx)^{1/2}$, and $\|x\|_\infty = \max_{0 \le m \le k} |x_m|$, $\|x\|_2 = (\sum_{0 \le m \le k} x_m^2)^{1/2}$ when $x \in \mathbb{R}^{k+1}$.

Since $\bar\mu_n(I(x, h))/h$ is close to $\mu(x)$ in probability, we have that $H_n(x)$ is close to

$$ h_{n,\mu}(x) = \Big( \frac{\log n}{n \mu(x)} \Big)^{1/(2s+1)}. $$
To avoid overloaded notations, it is convenient to write $K$ instead of $K_s$ and to introduce for $j \in J_n$,

$$ H_j = H_n(x_j), \quad h_j = h_{n,\mu}(x_j), \quad \mu_j = \mu(x_j), \quad r_j = r_{n,\mu}(x_j), $$

and $\bar q_j = n c_s h_j \mu_j$, $q_j = n c_s H_j \mu_j$, where $c_s$ is given by (3.1). We denote by $\mathfrak{X}_n$ the sigma algebra generated by the observations $X_i$, $1 \le i \le n$.

4.1. Preparatory results. We define 

n 
i=l 

where L\ is a positive constant, and 

n 

B n,j = {|(E^M')M' -!| < ^n}, C nJ 4 {1^/^ - IK ^n}, 
i=l 
n 

e«j = {KE^mOM- - 11*1111 < « A1 }, 

i=l 

where L2 is a fixed positive constant and 

B n = f] (A nj n B nj n E nj ) n f] C n>j . (4.1) 

A control over the probability of this event is given in Lemma 7 below. Let us denote
Z n = maxj(zj 2 n \Z n j | where Z n j = tJ Y^=i^,i^i,j- Informally, the variable Z n 
corresponds to the variance term of $\mathcal{E}^{\Delta}_{n,f}$. We recall that $M_n$ is equal to the cardinality of $J_n$.



Lemma 1 (variance term). For any $\varepsilon > 0$,

$$ \sup_{f \in \Sigma^Q(s,L)} \mathbb{P}^n_{f\mu}\{ Z_n \mathbf{1}_{B_n} > (1 + \varepsilon) L c_s^s \|K\|_2 \} \le 2 (\log n)^{2s/(2s+1)} n^{-\varepsilon/(2s+1)}. $$

Proof. Conditionally on $\mathfrak{X}_n$, $Z_{n,j}$ is centered Gaussian with variance

$$ v_j^2 = \frac{\sigma^2}{r_j^2} \frac{\sum_{i=1}^n K_{i,j}^2}{(\sum_{i=1}^n K_{i,j})^2}. $$

On $B_n$, we have for any $j \in \mathcal{J}_{2,n}$ and $n$ large enough

$$ v_j^2 \le (1 + o(1)) \frac{\sigma^2 \|K\|_2^2}{r_j^2 \bar q_j} = (1 + o(1)) \frac{\sigma^2 \|K\|_2^2}{c_s \log n}, $$

where we used the definition of $h_{n,\mu}(x)$, thus $v_j^2 \le (1 + \varepsilon) \sigma^2 \|K\|_2^2 / (c_s \log n)$. Using the standard Gaussian deviation, we obtain

$$ \mathbb{P}^n_{f\mu}\{ |Z_{n,j}| \mathbf{1}_{B_n} > (1 + \varepsilon) L c_s^s \|K\|_2 \} \le 2 \exp\Big( - \frac{(1 + \varepsilon) L^2 c_s^{2s+1}}{2\sigma^2} \log n \Big) = 2 \exp\Big( - \frac{1 + \varepsilon}{2s+1} \log n \Big) = 2 n^{-(1+\varepsilon)/(2s+1)}, $$

and bounding from above the probability of $\bigcup_{j \in \mathcal{J}_{2,n}} \{ |Z_{n,j}| \mathbf{1}_{B_n} > (1 + \varepsilon) L c_s^s \|K\|_2 \}$ by the sum of the probabilities, and since $|\mathcal{J}_{2,n}| \le M_n \le (\log n)^{2s/(2s+1)} n^{1/(2s+1)}$, the lemma follows. □
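The step from the Gaussian tail bound to the power of $n$ in this proof relies on the identity $L^2 c_s^{2s+1} / (2\sigma^2) = 1/(2s+1)$. The short sketch below checks it numerically, assuming $c_s = (\sigma/L)^{2/(2s+1)} (2/(2s+1))^{1/(2s+1)}$, which is the value of the constant consistent with that identity; the function names are hypothetical.

```python
import math


def c_s(s, L, sigma):
    """Bandwidth constant, assumed to be (sigma/L)^{2/(2s+1)} (2/(2s+1))^{1/(2s+1)}."""
    return (sigma / L) ** (2 / (2 * s + 1)) * (2 / (2 * s + 1)) ** (1 / (2 * s + 1))


def tail_exponent(s, L, sigma):
    """Coefficient of (1 + eps) log n in the Gaussian tail bound of the proof."""
    return L ** 2 * c_s(s, L, sigma) ** (2 * s + 1) / (2 * sigma ** 2)


for s in (0.5, 1.0, 2.5):
    print(s, tail_exponent(s, L=2.0, sigma=0.7), 1 / (2 * s + 1))
```

Multiplying the resulting bound $2 n^{-(1+\varepsilon)/(2s+1)}$ by the number of grid points, at most $(\log n)^{2s/(2s+1)} n^{1/(2s+1)}$, gives the right-hand side of the lemma.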



For any j £ J n ,2, we define 
b n r = max 

where b nJd = E n f4i {B nJJ l Bn }, U nJJ = B nJ>j - b nJJ and 



b n r = max I b n f ,■ I and U n t = max \U n t ,• I , 



i=l 

The quantities $b_{n,f}$ and $U_{n,f}$ correspond to bias terms of the risk $\mathcal{E}^{\Delta}_{n,f}$.

Lemma 2 (first bias term). We have

$$ \limsup_n \sup_{f \in \Sigma(s,L)} b_{n,f} \le L c_s^s B(s, 1), $$

where $B(s, L)$ is defined by (A.2).

Lemma 3 (second bias term). There is a constant $D_U > 0$ such that for any $\varepsilon > 0$,

$$ \sup_{f \in \Sigma(s,L)} \mathbb{P}^n_{f\mu}\{ U_{n,f} \mathbf{1}_{B_n} > \varepsilon \} \le \exp\big( - D_U \, \varepsilon (1 \wedge \varepsilon) \, n^{2s/(2s+1)} \big). $$

The proofs of these lemmas are delayed until Section 4.4.

4.2. Local polynomial estimation. In this section we give results concerning local polynomial estimation. This well known estimation procedure provides an efficient method for recovering both a function and its derivatives. Lemma 4 below is one version of the bias-variance decomposition of the local polynomial estimator, which is classical: see Korostelev and Tsybakov (1993), Fan and Gijbels (1995, 1996), Spokoiny (1998) and Tsybakov (2003), among many others. To a vector $\theta \in \mathbb{R}^{k+1}$ we associate the polynomial

$$ P_\theta(y) = \theta_0 + \theta_1 y + \cdots + \theta_k y^k. $$

If $\hat\theta_I$ is the solution of the system $\bar{\mathbf{X}}_I \theta = \mathbf{Y}_I$ (see Section 3.3) for $I = I(x, h)$, we define $\hat f_I(y) = P_{\hat\theta_I}(y - x)$. If $V_{I,k} = \operatorname{Span}\{ \phi_{I,m} ; 0 \le m \le k \}$, we note that on $\Omega_{n,I}$, $\hat f_I$ satisfies

$$ \langle \hat f_I, \phi \rangle_I = \langle Y, \phi \rangle_I, \quad \forall \phi \in V_{I,k}. \tag{4.2} $$
By definition, we have $\hat f_n^{(m)}(x_j) = \hat f^{(m)}_{I(x_j, h_n)}(x_j)$, where $\hat f_I^{(m)}$ is the derivative of order $m$ of $\hat f_I$, and $\bar f_n(x_j) = \hat f_{I(x_j, t_n)}(x_j)$, see Section 3.3. We introduce the diagonal matrix $\Lambda_I$ with entries

$$ (\Lambda_I)_{m,m} = \|\phi_{I,m}\|_I^{-1}, $$

for $0 \le m \le k$, where $\| \cdot \|_I^2 = \langle \cdot, \cdot \rangle_I$, the symmetrical matrix

$$ \mathcal{G}_I = \Lambda_I \mathbf{X}_I \Lambda_I, $$

where $\mathbf{X}_I$ is introduced in Section 3.3, and $\mathcal{G}$ the matrix with entries

$$ (\mathcal{G})_{p,q} = \frac{\chi_{p+q}}{\sqrt{\chi_{2p} \chi_{2q}}}, $$

for $0 \le p, q \le k$, where $\chi_m = (1 + (-1)^m)/(2(m+1))$. It is easy to see that $\lambda(\mathcal{G}) > 0$ (we recall that $\lambda(M)$ is the smallest eigenvalue of a matrix $M$). We define the event

where £l n j is defined in section 13.31 and 

where if I = 7(x, /i) for some i£ [0,1], /i > 0, 

£»,/ = {|A(&) - A(C?)| < u- 
For ^ m ^ 2/c an interval I C [0, 1] and 5 > 0, we define 

1 



and 



We define 



where 



2A- 



<Aj,m d/i n - Xr 



n 



Nn = P N nf/(jej)Afi) n P N n . 
jeJn jej„ 

fin(I(x,h)) 



P| ^n,m,/ (sjjtr 



I(xj,t n ) ' 



N 



n,I(x,h) 



n(x)h 



.}. 



(4.3) 



Finally, we introduce 

c n = n n nA,nP n nN„. 

A control on the probability of this event is given in Lemma 7 below. We recall that $M_n$ is the cardinality of $J_n$.
Lemma 4. There exists a centered Gaussian vector $W \in \mathbb{R}^{(k+1)M_n}$ with

$$ \mathbb{E}^n_{f\mu}\{ W_p^2 \} = 1, \quad 0 \le p \le (k+1) M_n, $$

such that on $C_n$, one has for any $0 \le m \le k$ and $f \in \Sigma(s, L)$:

$$ \max_{j \in J_n} |\hat f_n^{(m)}(x_j) - f^{(m)}(x_j)| \le (1 + o(1)) \, C L h_n^{s-m} \big( 1 + (\log n)^{-1/2} W^M \big), \tag{4.4} $$

where

$$ W^M = \max_{0 \le p \le (k+1)M_n} |W_p|, $$

and $C = C_{\lambda,m,q,k}$, where $C_{\lambda,m,q,k} = \lambda^{-1}(\mathcal{G}) (k+1) \, m! \sqrt{2m+1} \, (1 \vee q^{-1/2})$. For the estimator near the boundaries, we have for $a = 1$ and $a = 3$:

$$ \max_{j \in \mathcal{J}_{a,n}} |\bar f_n(x_j) - f(x_j)| \le (1 + o(1)) \, C L t_n^s \big( 1 + W^{(a)} \big), \tag{4.5} $$

where

$$ W^{(1)} = \max_{0 \le p \le (k+1)|\mathcal{J}_{1,n}|} |W_p|, \qquad W^{(3)} = \max_{(k+1)(|\mathcal{J}_{1,n}| + |\mathcal{J}_{2,n}|) + 1 \le p \le (k+1)M_n} |W_p|, $$

and $C = C_{\lambda,0,q,k}$.

Lemma 5. For any interval I C [0, 1] and p > we have 

EUKWI*n} = OK /2 ). 

Moreover, for any 1 ^ m ^ k, we have on T n j (see section Wlfy 

® n fA\(9l)m\ P \3Zn} =0{pP). 

The proofs of these lemmas are delayed until Section 4.4. The following two lemmas are needed for the proof of Theorem 1.

Lemma 6. If $w(x) \le A(1 + |x|^b)$ for some $A, b > 0$, we have

$$ \sup_{f \in \Sigma^Q(s,L)} \mathbb{E}^n_{f\mu}\{ w^2(\mathcal{E}_{n,f}) \} = O\big( n^{2b(1 + s/(2s+1))} \big). \tag{4.6} $$

We define $\Gamma_n = \bigcap_{j \in J_n} \Gamma_{n, I(x_j, h_n)}$, where $\Gamma_{n,I}$ is defined by (3.4). The probability $\mathbb{P}^n_\mu$ stands for the joint law of $X_1, \ldots, X_n$.

Lemma 7. There exists an event $A_n \in \mathfrak{X}_n$ such that for $n$ large enough, under Assumption D,

$$ \mathbb{P}^n_\mu\{ A_n^c \} \le \exp\big( - D_A n^{s/(2s+1)} \big), \tag{4.7} $$

where $D_A > 0$ and

$$ A_n \subset B_n \cap C_n \cap \Gamma_n, \tag{4.8} $$

where $B_n$ is defined by (4.1) and $C_n$ is defined by (4.3).
4.3. Proofs of the main results. The next proposition is a deviation inequality for the discretised risk $\mathcal{E}^{\Delta}_{n,f}$. This proposition is of special importance in the proof of Theorem 1 and Proposition 1.

Proposition 2. There is $D_{\mathcal{E}} > 0$ such that for any $\varepsilon > 0$, we have

$$ \sup_{f \in \Sigma^Q(s,L)} \mathbb{P}^n_{f\mu}\{ \mathcal{E}^{\Delta}_{n,f} \mathbf{1}_{A_n} > (1 + \varepsilon) P \} \le \exp\big( - D_{\mathcal{E}} \, \varepsilon (1 \wedge \varepsilon) (\log n)^{2s/(2s+1)} \big), \tag{4.9} $$

for $n$ large enough. Moreover,

$$ \sup_{f \in \Sigma^Q(s,L)} \mathbb{E}^n_{f\mu}\{ w^2(\mathcal{E}^{\Delta}_{n,f} \mathbf{1}_{A_n}) \} = O(1). \tag{4.10} $$

Proof. We decompose the risk into three parts 

= + + (4.11) 

where £^ = swpj^j a n r~ 1 \f n (xj) — f(xj)\. For a = 1 and a = 3, the quantity £^ 
is the risk at the boundaries of [0, 1]. Note that on B n , we have Yli=i ^i,j/( n ^j) > 
c s //j(l — Lj5* A ) > c s g(l — Li<5* A1 ) > (5 n for n large enough. Hence, since A n C £? n 
(see lemma EJ) we can decompose on the middle risk into bias and variance terms 
as follows: 

£ n,f < *W + ^/ + Z - ( 4 - 12 ) 
In view of lemma El we have for n large enough b n j ^ (1 + 2e)Lc s s B(s, 1) and using 
equation (|A.3|) we obtain 

{<> 2 l^>(l + 2e)P} 

C {Z n l Bn > (l+e)Lc s s \\K\\ 2 }U{U nJ l Bn > eLc s s \\K\\ 2 }. 

Then, in view of the lemmas Q and 03 it is easy to find D 2 > such that for any 
/ G S^(s,-L) and n large enough, 

F n f j£*fl An > (1 + 2e)P} < exp ( - D 2 e(l A e) log n) . (4.13) 

Using lemma 01 we obtain 

£*fU n < « /(2s+1) (l + (4.14) 

where VI^ 1 ) = ma^o<p<(ifc+i)x|Ji „| |Wp| an d ^3 = C'lMloo • Since W is a cen- 
tered Gaussian vector suc h that E" {Wp } = 1 f or ^ p < (fe + 1)M„ it is well 
known (see for instance in lLedoux and Talagrandl (|l99lft ) that 



< V 21 ° g((fc + 1)1 = °(Vloglogn), 

since \J\, n \ = O(logn), and that for any A > 0, 

P^{ W W -E^{W«} > A} < 2exp(-A 2 /2). 
Then, when n is large enough,

n^npAn > ^P) < V^{W^ -E^WM} > eP6- s '^ 

^ 2 exp ( - e 2 P 2 £- 2s /( 2s+1 V(2^)) . 

The same result holds for S^f- Hence, together with (|4.13[) . for a good choice of 
D £ we obtain (jO|l . It is easy to prove (|4.10j) from (jO|l . For any / G and 
p > 0, when n is large enough, 

r+oo 

El^jfuj = P / t^l^ f i An > t}dt 

Jo 

^{2Pf+pe D£ / t p - 1 exp(- D £ t/P)dt = 0(l), 

thus (jmnj), since w(sc) ^ A(l + |x| b ). □ 

Proof of theorem^ Let x G [xj, Xj+i). Since /i G S(z/, £?) with < z/ ^ 1 we have 
clearly ^ s /( 2s +!) g S(si//(2s + 1), £» s ^ 2s+1 ^) and using assumption iDl 

sup Ir^^)- 1 - rj 1 ] < rj 1 m s/(2s+1) A W(2*+i) = o(1)r ~l (4 15) 

xe[xj,xj+i] y 1 / 

Since / G T@{s, L), writing the Taylor expansion of / at x G [x-,-, 2j+i) we obtain: 
\f n ( x )-f( x )\^\f n ( Xj )-f( Xj )\ 

k / \Tfi 

771. 

m=l 

and in view of (|4.15|) . 

k 

£ nJ < (1 + o(l)) + rnaxr" 1 £ |/( m )(x,) - /M^-Jl) + 0(£). 

m = l 

We consider the event .A n from lemma Since .A n C C n we have that on A n , in 
view of lemma |U and for any 1 ^ m $C k, 

maxr7V( m) (^)-/ (m) (x,)|^- 

< (1 + (l))CllMll^ (2s+1) C(l + (logn)" 1 / 2 ^), 

and then 

fnju* < a + «(i)X/U n + o(i)s n (i + sy 2 w M ) + (i). 

We define W n 4 {|W M - E^{W M }| < Since W M = max ^(fc+l)M n |WJ,|, 

we know in the same way as in the proof of proposition [21 that „{W } 
V21og((A; + l)M n ) = (9(^ 1/2 ) and 

^{^}^2exp(-,5- 2 /2). (4.16) 
Thus

£njU n nw n < (1 + o(l))££ f l An + (4.17) 
and since w is non-decreasing, we have for any e > 

E n f Jw(£ n , f )} 

< El^{w(£ n j)l AnnW J+El^{w(£ n j)l A c nUW c} 

< «,((1 + 2e)P) + (E^K(f n)/ )}P^{^ U W^}) 1/2 

+ (E^{^ 2 ((1 + 2 £ )< / l^)}P£ M {< / l Ai > (1 + ,)P}) 1/2 

< w((l + 2e)P) + 0( n b ( 1+s /( 2s+1 » exp(-(log n) 2 /4)) 

+ 0{exp(-D £ e(l A e)(log n) 2s ^ 2s+1 ^)) = w((l + 2e)P) + o(l), 
where we used proposition^ lemmas El and the fact that w is continuous. Thus, 
limsup sup E^{w(£ nJ )} < w((l + 2e)P), 

" /eS<3(s,L) 

which concludes the proof of theorem ^ since e can be chosen arbitrarily small. □ 

Proof of proposition We consider the event W n defined in the proof of theorem 
Since A n C B n C C n j for any j £ J n we have 

(1 - o(l)) rj < P n (xj) < (1 + o(l))rj (4.18) 

on ,A n . In view of Q4.15[) and (|4.17f) we have for any j £ J n , x G [ajj, a-j+i) on 

Rnixf^Mx) - f(x)\ = r ^^r n ,^xr 1 \fn(x) - f(x)\ 

< (1 + (1))£„,/ < (1+0(1))^ + 0(1). 

Thus, if ^" n ,/,/3 = { supa-gp^j P n (x) _1 |/n(^) - 70*01 ^ (1 + /?)P} lemma[71 proposi- 
tionEland (jUBj) entail for any / G £ Q (s,L), 

< P^{</l^ n > (1 + 0/2)P} + ^IMn U W n} 

< exp(-D c p{2 A/3)(logn) 2s /( 2s+1 )), 

for a suitable choice of $D_c$. When $n$ is large enough, the choice $\beta = \beta(n,\alpha)$ makes
the right-hand side of the above inequality equal to $\alpha$, hence (1.13). Using again (4.18),
lemma 7 and (4.15), it is easy to obtain (1.14). □

4.4. Proof of lemmas 2, 3, 4, 5, 6 and 7. Since $b_{n,f,j}$ and $U_{n,f,j}$ only depend on $f$
via its values in $[0,1]$, we have

$$\sup_{f \in \Sigma(s,L)} b_{n,f,j} = \sup_{f \in \Sigma(s,L;\mathbb R)} b_{n,f,j}, \qquad \sup_{f \in \Sigma(s,L)} U_{n,f,j} = \sup_{f \in \Sigma(s,L;\mathbb R)} U_{n,f,j}. \qquad (4.19)$$

Here, it is convenient to introduce $P_j = \sum_{i=1}^n (f(X_i) - f(x_j))K_{i,j}$ and $Q_j = \sum_{i=1}^n$
Proof of lemma 2. On $A_{n,j} \cap C_{n,j}$ we have $(1-o(1))q_j \le Q_j \le (1+o(1))q_j$ and since
$B_n \subset A_{n,j} \cap C_{n,j}$ for any $j \in J_{2,n}$, we have

IV^-I =r7 1 |E^ t {(P J /Q,)lBj| < (l+o(l))(r J g J )- 1 |E^{ J P,-l B J|. 

Recalling that $K = \varphi_s / \int \varphi_s$ with $\varphi_s \in \Sigma(s,1;\mathbb R)$, we have for any $x, y \in \mathbb R$

$$|K(x) - K(y)| \le \kappa |x-y|^{s_1},$$

where $s_1 = s \wedge 1$ and $\kappa = (\int \varphi_s)^{-1}$ when $s \in (0,1]$ and $\kappa = \|K'\|_\infty$ when $s > 1$.
Since $\mathrm{Supp}\, K = [-T_s, T_s]$, we have for $n$ large enough on $B_n$:



si 



si 



llXi-xjl^CsTsiHjVhj) 



(4.20) 



^ KT ^ ( 1 _ £ ) ^-Xjl^CsTsil+S^h, = 0(1)1 



M 



where My = {\Xi — Xj\ ^ c s T s (l + 5 n )hj}. We introduce Vf^(x) - 
1 f(x)<f(x j ), Rid = (f( x i) ~ f( x j)) K i,j, s i,j = Vf,j( x i)U( x i) ~ f(xj))^M ld , Rj 
E"=i R i,j and S j = E?=i Then > 

4 |E ^ l - } i 

^— ClE^^-ll + oa)!!^}!) 



i 



{f{xj + yc s hj) - f(xj))K(y)n(xj + yc s hj)dy\ 



+ o(l)| / (/(xj +yc s hj) - f(xj))uf tj (xj + c s yhj)n(xj + yc s hj)dy\), 



'b|sS(l+<5n)^ 

and since /u G g) we have 



+ / l/fcj + yc s /ij) - f(xj)\dy. 
Using (4.19) and the fact that $\Sigma(s,L;\mathbb R)$ is invariant by translation,

sup b n jj < (1 + o(l)) sup max — (| / (f(c s hjy) - f(0))K(y)dy\ 

+ o(l) [ \f(c s h jy )-f(0)\dy). (4.21) 

Now we use an argument known as renormalisation, see Donoho and Low
(1992). We introduce the functional operator $U_{a,b}f(\cdot) = af(b\,\cdot)$. We have that
$f \in \Sigma(s,L;\mathbb R)$ is equivalent to $U_{a,b}f \in \Sigma(s, Lab^s;\mathbb R)$. Then, choosing $a = (Lc_s^sh_j^s)^{-1}$
and $b = c_sh_j$ entails



sup b nJ ^(l + o(l))Lc s s B(s,l) + o(l) sup ( \f(y)-f(0)\dy, 

/G£(s,L;R) /e£(s,l;R) .%|^2T 

where $B(s,1)$ is given by (A.2) and where we recall that $r_j = h_j^s$. We define $f_k(y) =
f(0) + f'(0)y + \cdots + f^{(k)}(0)y^k/k!$. Since $f \in \Sigma(s,L;\mathbb R)$, we have $f - f_k \in \Sigma(s,L;\mathbb R)$
and finally

sup 6 nJ < (i + (i))Lc^( s ,i) + o(i) / □ 

/SS(s,L;R) ^M<2T 
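
The renormalisation step used in this proof can be checked numerically on a simple example. The sketch below assumes $s \in (0,1]$ and the test function $f(x) = L|x|^s$ (both chosen for illustration only), and verifies on a grid that $U_{a,b}f$ has Hölder constant $Lab^s$, so that the choice $a = (Lb^s)^{-1}$ brings the constant back to 1. The helper names and the numerical values are hypothetical.

```python
def holder_quotient(f, s, grid):
    """Largest |f(x) - f(y)| / |x - y|**s over distinct grid points (0 < s <= 1)."""
    best = 0.0
    for i, x in enumerate(grid):
        for y in grid[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y) ** s)
    return best


def renormalise(f, a, b):
    """Return the function U_{a,b} f : x -> a * f(b * x)."""
    return lambda x: a * f(b * x)


s, L = 0.5, 2.0            # illustrative smoothness and radius
b = 0.3                    # stands for c_s * h_j in the text
a = 1.0 / (L * b ** s)     # chosen so that L * a * b**s = 1

f = lambda x: L * abs(x) ** s
g = renormalise(f, a, b)

grid = [i / 50.0 - 1.0 for i in range(101)]  # contains the point 0
constant_f = holder_quotient(f, s, grid)     # close to L
constant_g = holder_quotient(g, s, grid)     # close to L * a * b**s = 1

if __name__ == "__main__":
    print(constant_f, constant_g)
```

The grid maximum is attained at pairs involving the origin, where the quotient equals the Hölder constant exactly, which is why a coarse grid suffices here.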

Proof of lemma\Q We recall that C/ n ,/j — r J 1 {^j~^ r j ^{^j^-Bn})- We use the same 
notations as in the proof of lemmaOl On B n we have (1— o(l))qj ^ Qj ^ (l+o(l))<fy-, 
and since E 5 {Pj*} ^ 4Q 2 ||liC||^n 2 we obtain in view of lemma [7| 



Then, it is easy to see that on B n , 

\UnJjl < -^-((1+0(1))^ -V n fJPj}\ + (l)|E^{P,l fln }|) +o(l), 

and we know from the proof of lemma that 

— lEl^PjlBjl < sup max J-|E^{P,-l fl J| < (1 + o(l))Lc^(s, 1), 

thus |f7„,/,j| < (1 + o(l))(r j g i )~ 1 |P j - E^{P,}| + o(l) on B n . From the proof of 
lemma |U we know that (rjf/j) _1 |E ^ {Sj}\ = 0(1), and using (|4,2U|) it is an easy 
computation to obtain that on B n , 

IPj-El^Pj}] < \R 3 -E14R,}\ +o(l)\S 3 -E^{5,-}| + (1)|E^{5,}|. 

Then we have for n large enough 

+ P^{|^-E^{S J }|>^}. 

We use Bernstein inequality to the sum of variables Rij = R{j — E^ ^{Rij} and 
5*ij — Sij— E^ 1 ^ i ^ n. The variables (Pij)i-^n are clearly independent, 

centered and satisfy |Pj ,-| ^ 401£ oo . In view of Q4.19JI and since /i E S„(j/, g), it is 
easy to prove with the same arguments as in the end of the proof of lemma 2 that
< (1 + oitychjiij [ (f( Xj + c s h jy ) - f{ Xj )) 2 K 2 {y)dy 



^ (1 + o(l))c a hj[ij sup / (f(xj + c s hjy) - f(xj)) 2 K 2 (y)dy 

/G£(s,L;R) J 

< (1 + o(l))L 2 ( Cs / lj ) 2s +V J sup / (/(y) - /(0)) 2 K 2 (y)dy 

/SS(s,L;R) J 



< (l + (l))^(c s ^)^ + Vi y y zs K\y)dy/(kX)\ 

Then ^/,u{-^fj} = 0{r 2 qj) and the Bernstein inequality entails that for n 

large enough, there is a constant D 4 > such that 

F} ifl {\Rj -E^iRj}] > < 2exp(-D 4 e(l A e)n s ^ 2s+1 *>). 

The variables (<Sij)i^i^ n are independent, centered and such that \Sij\ ^ 4Q, and 
in the same way as previously we can prove Y17=i ^///Wij} = 0{r 2 qj). Using again 
Bernstein inequality, it is easy to find D5 such that 

" E /,^}l > e W3} < 2exp(- J D 5 e(l A e)n s /( 2s+1 )), 

and since \J2,n\ ^ -^n; we have for any / E £^(s, L), 

< 4M n exp ( — (D4 A D 6 ) e(l A e)n s/{2s+1) ) . 

Since $4M_n\exp(-(D_4 \wedge D_5)\varepsilon(1 \wedge \varepsilon)n^{s/(2s+1)}/2)$ goes to $0$ as $n$ goes to $+\infty$, the lemma
follows with $D_U = (D_4 \wedge D_5)/2$. □

Proof of lemma^ We take / = I(x,h) for some a; € [0, 1], h > and define the 
vector 61 with coordinates (6i) m = f^ m \x)/m\ for ^ m ^ fc. Since X/ = X/ on 
^n,J 5 we have Aj 1 (9j — 6j) = QJ 1 Aj'Kj(0j — 9i). If //(y) = -Pe 7 (y — x), we have in 
view of (|4.2j) for any ^ m ^ fc: 

(X/(?/ - 0j)) m = (// - // , /jW )j = {Y — fi, /jm )/ 

= (/-//) 0/,m)/ + (£ , <£l,m)/, 

thus X/(0/ - 0/) = B/ + V/. Since / E S(s,L), 

(AjBi) m < H^mllJ^/ - // , 0/,m)/| < ||/ " Mil < Lh'/kl, 
then we can write 

where u E is such that |M|oo < 1 and 7/ = (ay^^i^-^^A/D/^ = T/£, 
where D/ is the matrix of size np, n (I) x (fc + 1) with entries (D/)j )I7l = (JQ — x) m , 
so that $X_I = (n\bar\mu_n(I))^{-1}D_I'D_I$. Since $T_IT_I' = \sigma^{-2}I_{k+1}$, we obtain that $\gamma_I$ is,
conditionally on $\mathfrak X_n$, centered Gaussian with covariance equal to $I_{k+1}$.

Consider / = I(xj,h) for some j £ J n , h > 0. From the inequality || • ||oo ^ 

|| • || ^ y/k + 1|| • Hoc and since [| ^ V / AT+T||<7^ 1 1| (Qj is symmetrical with 

entries smaller than 1 in absolute value) we get 

\\Aj\9j " 0l)||oc < II^^Hoo + r ^ T - \\Gi 1/2 ll\\oo 
< [I^IKfc + l){Lh S + ^— h/lloo) 

= \-\Qi){k + l)(Lh s + max |W r (fc+1)i+m |), 

where W = (7/(xo,h)> • • • > 7/(* Mn ,h))'- KT= (T J(a , 0jh) , . . . , T J(a , Mn>/l) )' we have W = 
T£, thus is a centered Gaussian vector and for any (k + l)j ^ m ^ (k + l)j + fc, 
j £ J n we have 

E 'f^{ w m} = (Var{Ty}) m , m = (Var{7 /(l . j ^ ) }) m _ (fc+1)iim _ (fc+1)j = 1, 
since Var^/^^)} = I k+1 . Then, we have proved that on r\j e j n Sl n>I ( Xj , h ), 

™^ ll A 7(* i ,/i)(^(iri,h) ~ 0J(sy,/o)lloc 



where VF M = max ^ m ^( fc+1 )| j\ \W m \. Since C n C N n n Q n n £ n , we have on C n for 
h = h n or h = t n , 

< (l + (l))A- 1 (e?)(fe + l)(L/ l s + ^^VF M ). 
Since C n C 2? n , we have for any j £ J n , ^ m ^ fc, 

C n C D n ,2m,/(x j ,/i„),5„ H D n)2 m,/(a; J -,tn),(5n' 

thus on C n , when h = h n or h = t n , we clearly have 

( A /(^,h))m, m = 1 1 ,ft),m 1 1 7(^,^1) ^ (1 + o(l))h' m V2m + 1. 
Since fn m \xj) - / (m) (xj) = m\{(6 I{X]M ) m - {9 1{x ^ hn) ) m ), it follows that on C n : 
l^ m) (^i)-/ (m) (^)l 



< (1 + o(l))A- 1 (a)m!V2m + l(fe + l)h~ m (Lh s n + - ^ TV 

< (1 + o(l))C7L/ i r m (l + (logn)- 1 / 2 ^), 

thus H4.4|) . Inequality (|4.5|) is obtained similarly. □ 
Proof of lemma 5. If $\bar\mu_n(I) = 0$ we have $\hat\theta_I = 0$ and the result is obvious, thus we
assume $\bar\mu_n(I) > 0$. In this case, $\Lambda_I$, $X_I$ and $G_I$ are invertible, and by definition of
$\hat\theta_I$,

h = AiAJ% = Aj^AjX/fl/ = A / g7 1 A / Y / = Aj^Bj + V/), 

where (B/) m = ||0/, TO ||7 1 (/ , </>j, m )j and (V/) m = H^/.mllj 1 ^ , <j>i,m)i- Since ||/||oo < 
Q we have |(B/) m | = \\(pi, m \\j 1 \{f , 4>i,m)l\ < 11/11/ ^ Q, thus HB/Hoo ^ Q. 

Conditionally on 3t n , V/ is centered Gaussian and it is an easy computation to see 
that its covariance matrix is equal to a 2 (nfl n (I))~~ 1 Aj'KjAj. Then AiQj l *Vj is con- 
ditionally on 3£ n centered Gaussian with covariance matrix o~ 2 {ji\i n (/) ) ^■^■j • 
If e m is the canonical vector with coordinates (e m ) p = l p=m , we have 

= |(0j, e m } \ = \{A J gj 1 B I ,e m )\+aVk + l'r, 

where 7 = (a\/k~+T)~ 1 (AiQy l \ i , e m ). By definition, we have HX7 1 1| = A _1 (Xj) ^ 
\/nfl n (I), and clearly ||X/|| ^ k + 1 and ||AJ 1 || ^ 1. Then, conditional on ~fc n , 7 is 
centered Gaussian with variance 

(e m , X 7 1 X/X / 1 e m ) ^ ||Xj 1 || 2 ||X/|| ^ 
(fc + l)n/in(/) ^ (fe + l)n/2 n (7) ^ 
Since ll^f 1 !! < ||A7 1 ||||X7 1 ||||A7 1 || < y/nj2 n (I) and (A/) ,o = 1, 

we have 

E^{|(W|£„} < (A; + l) p /W 2 (Q V l) p E^{(l + a\ 7 \r\X n } = 0(n"/ 2 ), 
for any I C [0, 1], and since ||A/|| ^ y/n on F n j, it follows that 

E^IK^kn^} < (k + lf/V(Q V l) p E^{(l + a\ 7 \y\x n } = OK), 

for any 1 ^ m ^ fc. □ 

Proof of lemma 6. We show that for any $p > 0$,

$$\sup_{f \in \Sigma_Q(s,L)} E^n_{f,\mu}\{\mathcal E_{n,f}^p\} = O(n^{p(1+s/(2s+1))}), \qquad (4.22)$$

which entails (4.6). By definition of $H_n(x)$, we have $H_n(x) \ge (\log n/n)^{1/(2s)}$ for any
$x \in [0,1]$. Since $\|f\|_\infty \le Q$,

we have for any $j \in J_{2,n}$,

\fn(x j )\^6- 1 (n/logn) 1 ^ 2s \Q + |£n|/v^)||tf s ||oo, 

where £ n = Ya=1^/V^ * s standard Gaussian. Then, 

E^{l/n(^)l p |X„} < «(n/ log n)^ 2s )(Q V 1) P E^{(1 + |4iri^n}||^l|oo 
= ( n P/(2,) (logn)P (i-i/(2s)) ) _ 

When j G J n ,i U 1/71,3, we have f n (xj) = di( X j,t n ) ana - m y i ew °f lemma[51 

E^{|/ n (x,)nX n }=OK/ 2 ). 

For any j G J n , since = m!(^, M ) m , we have in view of lemmaEthat 

on T n j( x ^ hn ), 
for any $1 \le m \le k$. Then, we obtain that for any $\|f\|_\infty \le Q$,

£ n j = 0((n/logny^ s+ ^){ sup \f n (x)\+Q), 

xe[o,i] 



and since 



k I 7{m) i \ I 

sup \f n (x)\ < max (\f n ( Xj )\ + ( £ ^P)l r „ J(l ,J = OK 



xe[o,i] v m! J 

thus (Ji^2l and (f4~o]) . □ 



Proof of lemma 7. The proof is divided into several steps. We recall that $q_j = nc_sh_j\mu_j$
and $\bar q_j = nc_sH_j\mu_j$.

Step 1. We prove that for any j £ Ji,n and n large enough, 

P£{B^.} ^ 2e X p(- J D 1< 5 2 n 2 */( 2s + 1 )), (4.23) 

where D\ is a positive constant. Consider the sequence of i.i.d variables = 
Kij — E"{i£"jj}, 1 ^ i ^ n. Since /j 6 S g (z^, ^) and J K = 1, we have for n 
large enough \w^{K Xd }/ qj - 1| < 5 n /2, thus B^. C {| E2=i CijIAfc < ^ n /2}. Since 
iCijI < and for n large enough £" =1 E£{C^} < i 1 + <$n)g# / the Bern- 

stein inequality entails (|4.23|) . 

Step 2. We prove that for any j £ i7n,2) 

W-nC^} < 2exp(-Z? 2 5 2 n n 2 ^ +1 )), (4.24) 

where -D 2 is a positive constant and £2,71 = «i = s A 1. In view of (|4.20j) . we 
have on C n j 

\K id - K id \ < «Tf (-^L-) Sl l Mij (4.25) 

where we recall that Mjj = — a?j| ^ c s T s (l + 5 n )hj}. We define r^j = 1m^ — 
P"{Mjj}. On C n j we have for n large enough 2c s T s H^f ^ 5 n , and since S 
p~nj 1 7"n] ! 

Xj ^ 1 - r n = 1 - 2c s T s H™ ^ 1 - 2^1^ 

^ 1 - 2c s T s (l - 5 n )^- < 1 - c s T s {l + £ n )fy 

for n large enough. On the other hand we have similarly Xj ^ c s T s (l + 5 n )hj. Thus, 
since \i £ Ti q (u, q) we have 



,J/ 2T a 



(1 + 5 n )c s hjHj 



^ - f \n{ Xj + c s y(l + - = 0(h u n ). (4.26) 

9 %Kr 
Since $x_j \in [c_sT_s(1+\delta_n)h_j, 1 - (1+\delta_n)c_sT_sh_j] \subset [c_sT_sh_j, 1 - c_sT_sh_j]$, we have for $n$
large enough on $C_{n,j}$,



i 



h ; 



— t 3 — I \K(y)\\n(xj +yc s hj) - fij\dy + 



Hi 



1 



(4.27) 



l-5„ 



Then, combining (|4.25l) . (|4.26j) and (|4.27() we obtain that on C n j and for n large 
enough, 



1 n X 

Hi ■ A x u n 

< E^l + 1± r iil i Ecyl + 2 ^ TS1+1 + 



and taking L\ = 4(kT Si+1 + 1), we obtain 



i=l 



i=i 



Then, applying the Bernstein inequality to the sums of the variables $\eta_{i,j}$ and $\zeta_{i,j}$, $1 \le i \le n$,
we obtain (4.24). We can prove

n C nJ } < 2exp(- J D 3 5 2 V 2s/(2s+1) ) 5 (4-28) 

where $D_3$ is a positive constant, in the same way as in the proof of (4.24), with a
good choice for $L_2$.



Step 3. We define the event 



n — 

'-'n,m,I(x,h),5 



I(x,h) 



<fil(x,h),m dUn ~ Xr, 



H(x)h m+1 

and we prove that if S 1>n = 1 - (1 + 5 n )-( 2s+1 \ 

D fl| o,J(a! i) (l-(yn)Aj),<5x,n n D n,0,/(ay,(l+a„)/i f ),5i,„ C C nj -. 
From the definitions of iifj and hj (see section IT!!)) we obtain 

{(1 - 6 n )hj < Hj} = {(1 - 5 n ) 2s hf < log n/(nfl n (I(x 3 , (1 - S n )hj)))} 
rfin(I(xj, (1 - 5 n )hj)) 



(4.29) 



^(l-5„r( 2s+1 )}, 



and then 



I - <5 n )/ij 
We can prove in the same way that on the other hand,

D n,0,/(%-,(l+ 1 5„)ft f ),5i J n c it 1 + S n)hj ^ Hj}, 

hence (|4~2l7J) . 

Step 4. We prove (4.8). If $\delta_{3,n} = \delta_n/(2-\delta_n)$, we clearly have for any interval $I$,

^>n,rn,I,5 3 ,„ n ^n,0,I,5 3 ,n C D n,m,/,(5„. 

Using the fact that $\lambda(M) = \inf_{\|x\|=1}\langle x, Mx\rangle$ for any symmetrical matrix $M$ and
since $G_I$, $G$, $X_I$ are symmetrical, it is easy to see that

fl {\&-GU\<7f^}t£n,I, (4.30) 



(k + lf 

0^p,qs^k 



and that 

2k 



n s ,^ c n {K x '- x wi< (t+1)2 

m=0 1 + ; 0f^p,q^k V ; 

c{|A(Xj)-A(X)| 

Recalling that if I = I(xj, h), 

{<t>I,p , <t>I,q)l Hjh^ 1 fi d P"n 



(^Qj^j 

||<Ml||<MI/ " ^^TT Jj <f>I,2 P dUn^^^T Jj h^djn 

it is easy to see that if S^ n = 5 n j ((2 — S n )(2k + l)(k + l) 2 ) , 

Dn,2p,/,<5 4 ,n n D n,2g,/,5 4 ,n n ^n,p+q,I,S 4 , n C _ G)p,q\ ^ 7^ + "g2 }' 

thus 

2k 



m=0 



and clearly for n large enough, if / = I(xj, h n ) or I = I(xj,t n ) 



fl ^n,m,iM, n C {|A(X X ) - A(X)| ^ <U n {|^^ - l| < <5 n } C (4.31) 



m=0 



Moreover, if / = I(xj,h n ), we have on T) n ^m,i,5 n for any 1 ^ m ^ A; and n large 
enough, 

ll^.mll/ > (1 - o(l))/i™V2m + l > 1/v^. (4.32) 

We define 

Dn,m = P| ^D n>mj /( a .^ ) / ln ) ) ,5 5in fl ^n,m,I(xj,t n ),6s, n 

n D n,0,/(o; :( -,(l-<5„)/ij),55, n n D n,0,/(x- J ,(l+5 n )^),<5 5in J > 
where 5,



5,n 



Or? — n 



and we choose 



An = D n n A n n B n n E„. 

In view of (jOHJ), dUSOJ), (|4~5TT) . (|03Jl we have A n C C n n n n £„ n T n and since 
D n ,o,J,i5 = we obtain 1)4.81) , 

Step 5. We prove (4.7). Using the Bernstein inequality, it is easy to show that for $n$
large enough, if $h = h_n$, $h = t_n$, $h = (1-\delta_n)h_j$ or $h = (1+\delta_n)h_j$,

$$P^n_\mu\{D^c_{n,m,I(x_j,h),\delta_{5,n}}\} \le 2\exp(-D_4\delta_{5,n}^2nh) \le 2\exp(-D_5n^{s/(2s+1)}),$$

with $D_4, D_5$ positive constants, where we used the fact that $\delta_{5,n}^2n^{s/(2s+1)} \ge 1$ for $n$
large enough and $nh \ge D_6n^{2s/(2s+1)}$. In view of (4.29) we have $D_n \subset C_n$, hence

f;{A c J ^ P^{D£ } + F]JA c n n C n } + P£ M { B n n C n } 
+ P^{E n nC n } + 3P^{C n } 



^4 



P/,^{Dn} + P/,^{A^ n c n } + P^{B£ n c n } + P^{E n n c n } 



<C 2(8A; + 7)M n exp(-2D A n 



^ exp(— Djsji 



3/(23+1)^ 



for n large enoug h, where D A = {D x V£> 2 V£> 3 VD 5 )/2, where we used (g23J), 

and (gHHI). □ 

5. Proof of theorem 2 

The proof of the lower bound is heavily based on arguments found in Korostelev
(1993), Donoho (1994), Korostelev and Nussbaum (1999) and Bertin (2004). It is
mainly a modification of the former proof in Bertin (2004). It consists in a classical
reduction to the Bayesian risk over a hardest cubical subfamily of functions, see
for instance Donoho (1994). The main difference with the former proofs is that the
subfamily of functions depends on the design via the bandwidth $h_{n,\mu}(x)$, which is
adapted to the local amount of data.

5.1. Preparatory results. We begin with some definitions. We recall that $\varphi_s$ is defined
by (1.6) and that it has a compact support $[-T_s, T_s]$. Let $h_{I_n} = \max_{x \in I_n} h_{n,\mu}(x)$
and

$$\Xi_n = 2T_sc_s(2^{1/(s-k)} + 1)h_{I_n}.$$

If $I_n = [a_n, b_n]$ and $M_n = [|I_n|\,\Xi_n^{-1}]$, we define the points

$$x_j = a_n + j\,\Xi_n, \qquad j \in J_n = \{1, \ldots, M_n\}. \qquad (5.1)$$

In order to lighten the notation, we denote again $\mu_j$
Lemma 8. Let us define the event

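$$H_{n,j} = \Big\{ \Big| \frac{1}{nc_sh_j\mu_j} \sum_{i=1}^n \varphi_s^2\Big(\frac{X_i - x_j}{c_sh_j}\Big) - 1 \Big| \le \varepsilon \Big\},$$

and $H_n = \bigcap_{j \in J_n} H_{n,j}$. We have

$$\lim_{n \to +\infty} P^n_\mu\{H_n\} = 1.$$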
Proof. We apply the Bernstein inequality to the sum of the variables $\varphi_s^2((X_i - x_j)/(c_sh_j))$, for
$1 \le i \le n$, where we use the fact that $\|\varphi_s\|_2 = 1$ (see appendix A), and we derive a
deviation inequality for the events $H_{n,j}^c$. Then, bounding from above the probability
of $\bigcup_{j \in J_n} H_{n,j}^c$ by the sum of the probabilities, the result follows easily. □
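
The Bernstein inequality invoked in this proof can be illustrated on a toy example. The sketch below is a minimal Monte Carlo check under stated assumptions: the summands are centered indicators $1\{X_i \in \text{window}\} - p$ (a stand-in for the bounded variables of the proof, not the exact variables used there), and the bound is the usual two-sided form $2\exp(-t^2/(2(V + bt/3)))$, where $V$ is the sum of the variances and $b$ bounds the summands. All names and numerical values are hypothetical.

```python
import math
import random


def bernstein_bound(t, variance_sum, b):
    """Two-sided Bernstein bound for a sum of independent centered variables with |zeta_i| <= b."""
    return 2.0 * math.exp(-t * t / (2.0 * (variance_sum + b * t / 3.0)))


def empirical_tail(n, p, t, n_rep=2000, seed=1):
    """Empirical frequency of |sum_i (1{U_i < p} - p)| >= t over n_rep replications."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        total = sum((1.0 if rng.random() < p else 0.0) - p for _ in range(n))
        if abs(total) >= t:
            hits += 1
    return hits / n_rep


n, p, t = 200, 0.1, 12.0
bound = bernstein_bound(t, n * p * (1.0 - p), 1.0)  # each summand is bounded by 1
freq = empirical_tail(n, p, t)

if __name__ == "__main__":
    print("empirical tail:", freq, "Bernstein bound:", bound)
```

The union bound over $j \in J_n$ then multiplies such a bound by $M_n$, which is harmless as long as the exponent grows like a power of $n$.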

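The subfamily of functions is defined as follows. We consider a hypercube $\Theta \subset
[-1,1]^{M_n}$, and for $\theta \in \Theta$ we define the functions

$$f(\cdot\,;\theta) = \sum_{j \in J_n} \theta_jf_j(\cdot), \qquad f_j(x) = Lc_s^sh_j^s\,\varphi_s\Big(\frac{x - x_j}{c_sh_j}\Big).$$

Clearly, $f_j \in \Sigma(s,L)$. Let us show that $f(\cdot\,;\theta) \in \Sigma(s,L)$. We note that

$$\mathrm{Supp}\Big(\varphi_s\Big(\frac{\cdot - x_j}{c_sh_j}\Big)\Big) = [x_j - c_sT_sh_j, x_j + c_sT_sh_j] = I_j.$$

If $x, y \in I_j$ then $f(x;\theta) = \theta_jf_j(x)$, $f(y;\theta) = \theta_jf_j(y)$ and the result is obvious. To
complete the proof, it suffices to consider the case $x \in I_j$ and $y \in I_{j+1}$. In this case,
since $|\theta_j| \le 1$ and $f_j^{(k)}$ vanishes at the endpoints of $I_j$, we have

$$|f^{(k)}(x;\theta) - f^{(k)}(y;\theta)| = |\theta_jf_j^{(k)}(x) - \theta_{j+1}f_{j+1}^{(k)}(y)|$$

$$\le |f_j^{(k)}(x) - f_j^{(k)}(x_j + c_sT_sh_j)| + |f_{j+1}^{(k)}(x_{j+1} - c_sT_sh_{j+1}) - f_{j+1}^{(k)}(y)|$$

$$\le L(|x - x_j - c_sT_sh_j|^{s-k} + |x_{j+1} - c_sT_sh_{j+1} - y|^{s-k})$$

$$\le L((2c_sT_sh_j)^{s-k} + (2c_sT_sh_{j+1})^{s-k}) \le 2L(2c_sT_sh_{I_n})^{s-k}.$$

Moreover, since $x \in I_j$ and $y \in I_{j+1}$ we have

$$|x - y| \ge x_{j+1} - x_j - c_sT_s(h_j + h_{j+1}) \ge \Xi_n - 2c_sT_sh_{I_n} = 2^{1/(s-k)}(2c_sT_sh_{I_n}),$$

and finally

$$|f^{(k)}(x;\theta) - f^{(k)}(y;\theta)| \le L|x - y|^{s-k}, \qquad (5.2)$$

thus $f(\cdot\,;\theta) \in \Sigma(s,L)$. For any $j \in J_n$, we define the statistics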



Vj 



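Lemma 9. Conditionally on $\mathfrak X_n$, the $y_j$ are Gaussian and independent. Moreover,
if $v_j^2$ denotes the conditional variance of $y_j$ given $\mathfrak X_n$, we have on $H_n$

$$E^n_{f,\mu}\{y_j \mid \mathfrak X_n\} = \theta_j, \qquad \frac{2s+1}{2(1+\varepsilon)\log n} \le v_j^2 \le \frac{2s+1}{2(1-\varepsilon)\log n}. \qquad (5.3)$$

In the model (1.1) with $f(\cdot) = f(\cdot\,;\theta)$, conditionally on $\mathfrak X_n$, the likelihood function
of $(Y_1, \ldots, Y_n)$ can be written on $H_n$ in the form

$$\frac{dP^n_{f,\mu}}{d\lambda_n}\Big|_{\mathfrak X_n}(Y_1, \ldots, Y_n) = \prod_{i=1}^n g_\sigma(Y_i) \prod_{j \in J_n} \frac{g_{v_j}(y_j - \theta_j)}{g_{v_j}(y_j)},$$

where $g_v$ is the density of $\mathcal N(0, v^2)$, and $\lambda_n$ is the Lebesgue measure over $\mathbb R^n$.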
Proof. By construction the $f_j$ have disjoint supports, thus it is easy to see that
conditionally on $\mathfrak X_n$ the $y_j$ are Gaussian and independent, with conditional mean $\theta_j$.
Using the definition of $H_n$ and since



^2 



it is an easy computation to see that on H n , we have ()5.3() . The last part of the 
lemma follows from the following computation: 

9vj{yj ~ Oj) 



n^(^) n 



9vj (yj 



n ex p ( - ^ 2 /(2a 2 )) n (WW - %)/(^])) 

v ; i=l jeJn 

" r ( >f • >:,./ (2Wi(*0 ojfAX,)^ 

exp ' 



' If 

a n (2ir) n / 2 11 
v ' i=l 

v 7 i=l 



2a 2 

2 v dP 1 ,' 



□ 



5.2. Proof of theorem 2. We denote in the following $\Sigma = \Sigma(s,L)$ and $\mathcal E_{n,f,T} =
\sup_{x \in I_n} r_{n,\mu}(x)^{-1}|T(x) - f(x)|$. Since $w$ is nondecreasing and $f(\cdot\,;\theta) \in \Sigma$ for any

$\theta \in \Theta$, we have for any distribution $B$ on $\Theta$, by bounding the minimax risk from below by
the Bayesian risk,


inf supE<^>« /iT )} > - e)P) inf supP^{4 /jT > (1 - e)P} 

> - £)P) inf / P^{</,T > (1 - 

•/ 

where Pg = ^- Since by construction /(a^-; 0) = rjOjP and Xj G J n , we obtain 

inf / P^ )/iT ^(l- £ )P}S(^) 
T Je 

> inf / / P£{ max |% - ^| > 1 - e|£ n }dP™£(cifl), 
5s / inf / P£{ max \6j - 9j\ ^ 1 - e |£„}B(d0)dP" 

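where $\inf_{\hat\theta}$ is taken among all measurable vectors (with respect to the observations)
in $\mathbb R^{M_n}$. Then, theorem 2 follows from lemma 8 if we prove that on
$H_n$,

$$\inf_{\hat\theta} \int_\Theta P^n_\theta\{\max_{j \in J_n}|\hat\theta_j - \theta_j| \ge 1 - \varepsilon \mid \mathfrak X_n\}\,B(d\theta) \ge 1 - o(1),$$

or equivalently, that on $H_n$

$$\sup_{\hat\theta} \int_\Theta P^n_\theta\{\max_{j \in J_n}|\hat\theta_j - \theta_j| < 1 - \varepsilon \mid \mathfrak X_n\}\,B(d\theta) = o(1). \qquad (5.4)$$

To prove (5.4), we choose

$$\Theta = \Theta_\varepsilon^{M_n}, \quad \Theta_\varepsilon = \{-(1-\varepsilon), 1-\varepsilon\}, \quad B = \bigotimes_{j \in J_n} b_\varepsilon, \quad b_\varepsilon = \tfrac12(\delta_{-(1-\varepsilon)} + \delta_{1-\varepsilon}),$$

where $\delta$ stands for the Dirac mass. Note that using lemma 9 the left hand side
of (5.4) is smaller than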

n" =lg ; ( ^\ ( II su p / ift-^Ki-^iCw- e j )db e {e j ))dY 1 ...dY n , 

and an easy argument shows that

$$\hat\theta_j = (1-\varepsilon)1_{y_j \ge 0} - (1-\varepsilon)1_{y_j < 0}$$

are strategies attaining the maximum. Thus, it suffices to prove the lower bound
among estimators $\hat\theta$ with coordinates $\hat\theta_j \in \Theta_\varepsilon$ and measurable with respect to $y_j$
only. Since the $y_j$ are independent with density $g_{v_j}(\cdot - \theta_j)$, the left
hand side of (5.4) is smaller than

II ™ ax / / 1 |e J K)-e J |<i- E ^ J (%-^)^^(%) 



II ( 1_ > f / 1 \e,(u)-e J \^i- £ 9v J {u-9 j )dudb £ (9 j )), 



and if $(aj) = gi(t)dt and Z?i is a positive constant, 



> f / / 1 \e j (u)-e j \^i- £ 9v j (u-9 j )dudb e 

> ~ inf o / ( X e >o + 1 e <o)^( u - (! - £ )) A ^ (« + (1 - e))du 



Vlogn 



where we used lemma 9 and the fact that for $x > 0$, $\Phi(-x) = (1+o(1))\dfrac{\exp(-x^2/2)}{x\sqrt{2\pi}}$ as $x \to +\infty$. It
follows that the left hand side of (5.4) is smaller than

$$\big(1 - D_1n^{-(1-\varepsilon)^2(1+\varepsilon)/(2s+1)}(\log n)^{-1/2}\big)^{M_n} \le \exp\big(|I_n|\,\Xi_n^{-1}\log(1 - D_1n^{-(1-\varepsilon)^2(1+\varepsilon)/(2s+1)}(\log n)^{-1/2})\big),$$

and if $D_2$ is a positive constant,

$$|I_n|\,\Xi_n^{-1}n^{-(1-\varepsilon)^2(1+\varepsilon)/(2s+1)}(\log n)^{-1/2} = D_2|I_n|n^{\varepsilon/(2s+1)} \times n^{\varepsilon^2(1-\varepsilon)/(2s+1)}(\log n)^{-1/2-1/(2s+1)} \to +\infty$$

as $n \to +\infty$, since $|I_n|n^{\varepsilon/(2s+1)} \to +\infty$, thus the theorem. □
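
The Gaussian tail equivalent used at the end of this proof is easy to check numerically. The following Python sketch compares $\Phi(-x)$, computed through the complementary error function, with $\exp(-x^2/2)/(x\sqrt{2\pi})$. The function names are hypothetical and the snippet is only an illustration of the asymptotic statement, not a step of the proof.

```python
import math


def gaussian_cdf(x):
    """Standard Gaussian distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def tail_equivalent(x):
    """Asymptotic equivalent exp(-x^2/2) / (x sqrt(2 pi)) of Phi(-x) for large x > 0."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))


def tail_ratio(x):
    """Ratio Phi(-x) / equivalent, which tends to 1 as x grows."""
    return gaussian_cdf(-x) / tail_equivalent(x)


if __name__ == "__main__":
    for x in (2.0, 5.0, 10.0):
        print(x, tail_ratio(x))
```

The ratio lies between $1 - 1/x^2$ and $1$ for every $x > 0$, so it is already within one percent of 1 at $x = 10$.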



Appendix A. Well known facts on optimal recovery 

A.1. Explicit values. To our knowledge, the function $\varphi_s$ is only known for $s \in
(0,1] \cup \{2\}$. We recall that the optimal recovery kernel is defined by

$$K_s = \frac{\varphi_s}{\int \varphi_s},$$

where $\varphi_s$ is given by (1.6). The kernel $K_s$ was found by Korostelev
(1993) for $s \in (0,1]$ and by Fuller (1961) for $s = 2$. See also Leonov (1997, 1999), Lepski and Tsybakov
(2000) and Bertin (2004b). When $s \in (0,1]$,



K s (t) 

where x + = max(0,rc), and 
When s = 2, we have 



a + 1 
2s 



(2s + l)(a + l) \ s /(2s+i) 
is 2 J 

p.(t) = ^- 2/5 52 (^- 2/5 t), 



where for i 

»w = E ((- w + - %) 2 )^[%- 1 ,% + x] 



3>0 



g= ^(3 + v 7 ^- ^26 + 6^/33 ) 2 , 

2(23q 2 - 14g + 23)^/1^9 
30(1 - g 5 / 2 ) 

and t-! = t = 0, h = VT+~q and for any j £ N - {0}, t 2 j = 2vT+gX)£o ^ /2 ' 
t2j+l = t2j+q^ 2 y/l + Note that <^ 2 is piecewise quadratic and infinitely oscillating 
around at the boundaries of its support. For these values of s, 



P = P, 



~2^ 



1(1) 



when s £ (0, 1], 
when s = 2. 



In figure 3 we give an illustration of the kernel $K_s$ for $s = 1/2$, $s = 1$ and $s = 2$.

Figure 3. Optimal recovery kernels $K_s$ for $s = 1/2$, $s = 1$ and $s = 2$.
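
For $s \in (0,1]$ the shape of the kernel can be sketched in a few lines of Python. The snippet below assumes the form $(s+1)/(2s)\,(1-|t|^s)_+$ on $[-1,1]$, which is the expression usually attributed to Korostelev (1993) up to a rescaling of the support; the exact normalisation used in this paper may differ. It checks numerically that this function integrates to 1, as any kernel of the form $\varphi_s/\int\varphi_s$ must. The function names and the midpoint rule are choices made for this example.

```python
def korostelev_kernel(t, s):
    """Assumed kernel shape (s+1)/(2s) * (1 - |t|^s)_+ for 0 < s <= 1."""
    return (s + 1.0) / (2.0 * s) * max(0.0, 1.0 - abs(t) ** s)


def integrate(f, a, b, cells=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / cells
    return h * sum(f(a + (i + 0.5) * h) for i in range(cells))


if __name__ == "__main__":
    for s in (0.5, 1.0):
        mass = integrate(lambda t: korostelev_kernel(t, s), -1.0, 1.0)
        print("s =", s, "integral =", mass, "K(0) =", korostelev_kernel(0.0, s))
```

Under this assumption the peak value is $(s+1)/(2s)$, so the kernel is triangular with height 1 when $s = 1$ and more sharply peaked when $s = 1/2$.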



A.2. Optimal recovery. The next results are well known and can be found in
Donoho (1994), Leonov (1997, 1999), Lepski and Tsybakov (2000) and Bertin (2004b).

The problem consists in recovering $f$ from

$$y(t) = f(t) + \varepsilon z(t), \qquad t \in \mathbb R, \qquad (A.1)$$

where $\varepsilon > 0$, $z$ is an unknown deterministic function such that $\|z\|_2 \le 1$ and
$f \in C(s,L;\mathbb R) = \Sigma(s,L;\mathbb R) \cap L^2(\mathbb R)$. This problem is well known, and the link
between this problem and statistical estimation in sup norm in the white noise
model

$$dY_t^\varepsilon = f(t)\,dt + \varepsilon\,dW_t, \qquad t \in \mathbb R,$$

was made by Donoho (1994), see also Leonov (1999). The minimax error for the
problem of optimal recovery of $f$ at $0$ in the model (A.1) is defined by

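$$E_s(\varepsilon, L) := \inf_T \sup_{\substack{f \in C(s,L;\mathbb R) \\ \|f - y\|_2 \le \varepsilon}} |T(y) - f(0)|,$$

where $\inf_T$ is taken among all continuous and linear forms on $L^2(\mathbb R)$. We know from
Micchelli and Rivlin (1977), Arestov (1990) that

$$E_s(\varepsilon, L) = \inf_{K \in L^2(\mathbb R)} \Big( \sup_{f \in C(s,L;\mathbb R)} \Big| \int K(t)f(t)\,dt - f(0) \Big| + \varepsilon\|K\|_2 \Big) = \sup_{\substack{f \in \Sigma(s,L;\mathbb R) \\ \|f\|_2 \le \varepsilon}} f(0).$$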
Note that $\varphi_s$ satisfies $\varphi_s(0) = E_s(1,1)$. For any $s > 0$, we know from Leonov (1997)
that $\varphi_s$ is well defined and unique, that it is even and compactly supported and that
$\|\varphi_s\|_2 = 1$. A renormalisation argument from Donoho (1994) shows that

$$E_s(\varepsilon, L) = E_s(1,1)\,L^{1/(2s+1)}\varepsilon^{2s/(2s+1)},$$

thus it suffices to know $E_s(1,1)$. If we define

$$B(s,L) := \sup_{f \in C(s,L;\mathbb R)} \int K_s(t)(f(t) - f(0))\,dt, \qquad (A.2)$$

we have the decomposition

$$E_s(1,1) = B(s,1) + \|K_s\|_2,$$
and in particular if P is given by (|1.5() and c s by we have 

$$P = Lc_s^s(B(s,1) + \|K_s\|_2), \qquad (A.3)$$
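
The effect of the renormalisation argument on the recovery error can be illustrated numerically in the case $s \in (0,1]$. The sketch below assumes, for illustration only, that the extremal functions have the form $(a - L|t|^s)_+$ with $a$ tuned so that the $L^2$ norm equals $\varepsilon$; it then checks that the peak value $a$ scales like $L^{1/(2s+1)}\varepsilon^{2s/(2s+1)}$ times its value for $\varepsilon = L = 1$. The function names, the midpoint rule and the bisection are choices made for this example.

```python
def l2_norm_sq(a, L, s, cells=4000):
    """Midpoint approximation of the squared L2 norm of t -> (a - L|t|^s)_+."""
    T = (a / L) ** (1.0 / s)          # half-width of the support
    h = 2.0 * T / cells
    total = 0.0
    for i in range(cells):
        t = -T + (i + 0.5) * h
        total += (a - L * abs(t) ** s) ** 2
    return total * h


def peak_value(eps, L, s):
    """Bisection for the value a such that (a - L|t|^s)_+ has L2 norm eps."""
    lo, hi = 0.0, 1.0
    while l2_norm_sq(hi, L, s) < eps ** 2:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if l2_norm_sq(mid, L, s) < eps ** 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s, eps, L = 0.5, 0.3, 2.0
direct = peak_value(eps, L, s)
rescaled = peak_value(1.0, 1.0, s) * L ** (1.0 / (2 * s + 1)) * eps ** (2 * s / (2 * s + 1))

if __name__ == "__main__":
    print(direct, rescaled)
```

Both numbers agree up to the bisection tolerance, which is what the scaling predicts; for $s = 1/2$ the reference value at $\varepsilon = L = 1$ is close to $3^{1/4}$.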